Optimal compression of hash-origin prefix trees

نویسنده

  • Jarek Duda
چکیده

There is a common problem of operating on hash values of elements of some database. In this paper there will be analyzed informational content of such general task and how to practically approach such found lower boundaries. Minimal prefix tree which distinguish elements turns out to require asymptotically only about 2.77544 bits per element, while standard approaches use a few times more. While being certain of working inside the database, the cost of distinguishability can be reduced further to about 2.33275 bits per elements. Increasing minimal depth of nodes to reduce probability of false positives leads to simple relation with average depth of such random tree, which is asymptotically larger by about 1.33275 bits than lg(n) of the perfect binary tree. This asymptotic case can be also seen as a way to optimally encode n large unordered numbers saving lg(n!) bits of information about their ordering, which can be the major part of contained information. This ability itself allows to reduce memory requirements even to ln(2) ≈ 0.693 of required in Bloom filter for the same false positive probability.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Tabu-Based Cache to Improve Range Queries on Prefix Trees

Distributed Hash Tables (DHTs) provide the substrate to build large scale distributed applications over Peerto-Peer networks. A major limitation of DHTs is that they only support exact-match queries. In order to offer range queries over a DHT it is necessary to build additional indexing structures. Prefix-based indexes, such as Prefix Hash Tree (PHT), are interesting approaches for building dis...

متن کامل

Provable Chosen-Target-Forced-Midfix Preimage Resistance

This paper deals with definitional aspects of the herding attack of Kelsey and Kohno, and investigates the provable security of several hash functions against herding attacks. Firstly, we define the notion of chosen-target-forced-midfix (CTFM) as a generalization of the classical herding (chosen-target-forced-prefix) attack to the cases where the challenge message is not only a prefix but may a...

متن کامل

Efficient Pseudorandom-Function Modes of a Block-Cipher-Based Hash Function

This article discusses the provable security of pseudorandom-function (PRF) modes of an iterated hash function using a block cipher. The iterated hash function uses the Matyas-Meyer-Oseas (MMO) mode for the compression function and the Merkle-Damgård with a permutation (MDP) for the domain extension transform. It is shown that the keyed-via-IV mode and the key-prefix mode of the iterated hash f...

متن کامل

Efficient In-Memory Indexing with Generalized Prefix Trees

Efficient data structures for in-memory indexing gain in importance due to (1) the exponentially increasing amount of data, (2) the growing main-memory capacity, and (3) the gap between main-memory and CPU speed. In consequence, there are high performance demands for in-memory data structures. Such index structures are used—with minor changes—as primary or secondary indices in almost every DBMS...

متن کامل

Optimal Maximal Prefix Coding and Huffman Coding

Huffman coding has been widely used in data, image, and video compression. Novel maximal prefix coding different from the Huffman coding is introduced. Relationships between the Huffman coding and optimal maximal prefix coding are discussed. We show that all Huffman coding schemes are maximal prefix coding schemes and have the shortest average code word length among maximal prefix coding scheme...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1206.4555  شماره 

صفحات  -

تاریخ انتشار 2012